Abstract. The goal of this note is to show that Hastings's counterexample to the additivity 
of minimal output von Neumann entropy can be readily deduced from a sharp version of 
Dvoretzky's theorem. 



Introduction 

A fundamental problem in Quantum Information Theory is to determine the capacity of a quantum channel to transmit classical information. The seminal Holevo-Schumacher-Westmoreland theorem expresses this capacity as a regularization of the so-called Holevo $\chi$-quantity (which gives the one-shot capacity) over multiple uses of the channel; see, e.g., [1]. This extra step could have been skipped if the $\chi$-quantity had been additive, i.e., if

(1) $$\chi(\Phi \otimes \Psi) = \chi(\Phi) + \chi(\Psi)$$

for every pair $(\Phi, \Psi)$ of quantum channels. It would have then followed that the $\chi$-quantity and the capacity coincide, yielding a single-letter formula for the latter. Determining the veracity of (1) had been a major open problem for at least a decade (we refer, e.g., to the survey [2]). Substantial progress was made by Shor [3], who showed that (1) was formally equivalent to the additivity of the minimal output von Neumann entropy of quantum channels, a much more tractable quantity. Using this equivalence, the equality (1) was eventually shown to be false by Hastings [4], with appropriate randomly constructed channels as a counterexample.

In this note, we revisit Hastings's counterexample from the viewpoint of Asymptotic Geometric Analysis (AGA). This field, originally an offspring of Functional Analysis, aims at studying geometric properties of convex bodies (or equivalently, norms) in spaces of high (but finite) dimension. More specifically, our goal is to show that (a variant of) Hastings's analysis can be rephrased in the language of AGA, and his result deduced with only minor effort from a sharp version of Dvoretzky's theorem [5] on almost spherical sections of convex bodies, a fundamental result of AGA. This makes the argument much more transparent and will hopefully lead to a better understanding of the problem of capacity. Our approach is largely inspired by Brandão-Horodecki [6], who were able to reformulate Hastings's analysis in the framework of concentration of measure.

Notation

Throughout the paper, the letters $C, c, C', \dots$ denote absolute positive constants, independent of the instance of the problem (most notably of the dimensions involved), whose values may change from occurrence to occurrence. The values of these constants can be computed by reverse-engineering the argument, but we will not pursue this task. We also use the following convention: whenever a formula is given for the dimension of a (sub)space, it is tacitly understood that one should take the integer part.

Let $\mathcal{M}_{k,d}$ be the space of $k \times d$ matrices (with complex entries), and $\mathcal{M}_d = \mathcal{M}_{d,d}$. More generally, $\mathcal{B}(\mathcal{H})$ will stand for the space of (bounded) linear operators on the Hilbert space $\mathcal{H}$. We will write $\|\cdot\|_p$ for the Schatten $p$-norm $\|A\|_p = \left(\mathrm{Tr}(A^\dagger A)^{p/2}\right)^{1/p}$. The limit case $\|\cdot\|_\infty$ is the operator (or "spectral") norm, while $\|\cdot\|_{HS} = \|\cdot\|_2$ is the Hilbert-Schmidt (or Frobenius) norm. Let $\mathcal{D}(\mathbb{C}^d)$ be the set of density matrices on $\mathbb{C}^d$, i.e., positive semi-definite trace one operators on $\mathbb{C}^d$ (or states on $\mathbb{C}^d$). If $\rho$ is a state on $\mathbb{C}^d$, its von Neumann entropy $S(\rho)$ is defined as $S(\rho) = -\mathrm{Tr}\,\rho \log \rho$. If $\Phi : \mathcal{M}_m \to \mathcal{M}_k$ is a quantum channel (completely positive trace preserving map), its minimal output entropy is

$$S_{\min}(\Phi) = \min_{\rho \in \mathcal{D}(\mathbb{C}^m)} S(\Phi(\rho)).$$

Concavity of $S$ implies that the minimum is achieved on a pure state.
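
For readers who wish to experiment numerically, the two basic quantities above can be sketched as follows. This is only a minimal illustration, assuming NumPy is available; the function names `schatten_norm` and `von_neumann_entropy` and the test matrix `A` are hypothetical helpers introduced here, and the natural logarithm is used.

```python
import numpy as np

def schatten_norm(a, p):
    """Schatten p-norm: the l_p norm of the singular values of a."""
    s = np.linalg.svd(a, compute_uv=False)
    if np.isinf(p):
        return s.max()                      # operator (spectral) norm
    return (s ** p).sum() ** (1.0 / p)

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), with the convention 0 log 0 = 0."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-15]                     # discard (numerically) zero eigenvalues
    return float(-(ev * np.log(ev)).sum())

k = 4
maximally_mixed = np.eye(k) / k             # entropy log k
pure = np.zeros((k, k))
pure[0, 0] = 1.0                            # pure state, entropy 0
A = np.arange(12.0).reshape(3, 4)           # arbitrary rectangular matrix
```

In this sketch the case $p = 2$ reproduces the Hilbert-Schmidt norm and $p = \infty$ the operator norm.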

Channels as subspaces 

The crucial insight that allows one to relate the analysis of quantum channels to high-dimensional convex geometry is the observation that there is an essentially one-to-one correspondence between channels and linear subspaces of composite Hilbert spaces. Specifically, let $\mathcal{W}$ be a subspace of $\mathbb{C}^k \otimes \mathbb{C}^d$ of dimension $m$. Then $\Phi : \mathcal{B}(\mathcal{W}) \to \mathcal{M}_k$ defined by $\Phi(\rho) = \mathrm{Tr}_{\mathbb{C}^d}(\rho)$ is a quantum channel; here $\mathrm{Tr}_{\mathbb{C}^d}$ is the partial trace with respect to the second factor in $\mathbb{C}^k \otimes \mathbb{C}^d$. Alternatively, and perhaps more properly, we could identify $\mathcal{W}$ with $\mathbb{C}^m$ via an isometry $V : \mathbb{C}^m \to \mathbb{C}^k \otimes \mathbb{C}^d$ whose range is $\mathcal{W}$ and define, for $\rho \in \mathcal{M}_m$, the corresponding channel $\Phi : \mathcal{M}_m \to \mathcal{M}_k$ by

(2) $$\Phi(\rho) = \mathrm{Tr}_{\mathbb{C}^d}(V \rho V^\dagger).$$

It is now easy to define a natural family of random quantum channels. They will be associated, via the above scheme, to random $m$-dimensional subspaces $\mathcal{W}$ of $\mathbb{C}^k \otimes \mathbb{C}^d$, distributed according to the Haar measure on the corresponding Grassmann manifold (for some fixed positive integers $m, d, k$ that will be specified later). Note that all reasonable parameters of a channel defined by (2), such as $S_{\min}(\Phi)$, depend only on the subspace $\mathcal{W} = V(\mathbb{C}^m)$ and not on a particular choice of the isometry $V$ (this will also be obvious from what follows). In particular, the language of "random $m$-dimensional subspaces of $\mathbb{C}^k \otimes \mathbb{C}^d$" is equivalent to that of "random isometries from $\mathbb{C}^m$ to $\mathbb{C}^k \otimes \mathbb{C}^d$."
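
The construction (2) is easy to simulate. The following is a minimal sketch, assuming NumPy; the helper names `random_isometry` and `channel` and the small dimensions are hypothetical choices made for illustration. The isometry is obtained from the QR decomposition of a complex Gaussian matrix, so that its range is a Haar-distributed $m$-dimensional subspace.

```python
import numpy as np

def random_isometry(m, k, d, rng):
    """Isometry V: C^m -> C^k (x) C^d whose range is a Haar-distributed subspace."""
    g = rng.standard_normal((k * d, m)) + 1j * rng.standard_normal((k * d, m))
    q, _ = np.linalg.qr(g)          # reduced QR: q is (k*d) x m with orthonormal columns
    return q

def channel(v, rho, k, d):
    """Phi(rho) = Tr_{C^d}(V rho V^dagger), an element of M_k."""
    big = v @ rho @ v.conj().T      # operator on C^k (x) C^d
    # index the tensor product as (a, j), (b, j') and sum over j = j'
    return np.einsum('ajbj->ab', big.reshape(k, d, k, d))

rng = np.random.default_rng(0)
m, k, d = 3, 2, 5                   # illustrative sizes only
V = random_isometry(m, k, d, rng)
psi = rng.standard_normal(m) + 1j * rng.standard_normal(m)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())     # a pure input state
out = channel(V, rho, k, d)         # a k x k density matrix
```

The output `out` has unit trace and nonnegative spectrum, as expected of a completely positive trace preserving map.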

The additivity conjectures and the main theorem 

The following question has attracted considerable attention in the last few years: if $\Phi$ and $\Psi$ are two quantum channels, is it true that

(3) $$S_{\min}(\Phi \otimes \Psi) = S_{\min}(\Phi) + S_{\min}(\Psi)\ ?$$

Shor [3] showed it to be formally equivalent to a number of central questions in quantum information theory, including the additivity of the $\chi$-quantity mentioned in the introduction.

Note that the inequality "$\leq$" always holds (consider product input states). However, as was first proved by Hastings using random constructions [4], the reverse inequality is false in general. The exegesis of Hastings's argument has subsequently been carried out in [6] and [7]. We will show here that the analysis of (a variant of) Hastings's example essentially amounts to applying the right version of Dvoretzky's theorem and leads to the conclusion that high-dimensional random channels typically violate (3).

Theorem 1. Let $k \in \mathbb{N}$, $m = ck^2$ and $d = Ck^2$ ($c$ and $C$ being appropriate absolute constants). Let $V : \mathbb{C}^m \to \mathbb{C}^k \otimes \mathbb{C}^d$ be a random isometry and $\Phi : \mathcal{M}_m \to \mathcal{M}_k$ be the corresponding random channel given by (2). Then for $k$ large enough, with large probability,

$$S_{\min}(\Phi \otimes \bar\Phi) < S_{\min}(\Phi) + S_{\min}(\bar\Phi).$$

Here $\bar\Phi$ denotes the complex conjugate of the channel $\Phi$ (the channel induced by the entrywise conjugate isometry $\bar V$). The expression "with large probability" in Theorem 1 and in what follows may be understood as "with probability $> \theta$, where $\theta \in (0, 1)$ is arbitrary but fixed in advance" (note that, in particular, the threshold value of $k$ could then depend on $\theta$). However, much stronger assertions are in fact true; for example, the probability of the exceptional set in Theorem 1 can be majorized by $\exp(-c'm)$. Another comment: one only uses in the proof that $m$ and $d$ are comparable, and larger than $ck^2$.

The proof will be based on separately majorizing $S_{\min}(\Phi \otimes \bar\Phi)$, which is done via a well-known and relatively simple trick, and on minorizing $S_{\min}(\Phi) = S_{\min}(\bar\Phi)$, which is the main point of the argument.

A question analogous to (3) can be asked for the minimal output $p$-Rényi entropy ($p > 1$). For the additivity of Rényi entropy, random counterexamples were constructed earlier by Hayden-Winter [8]. It was shown in [9] that the Hayden-Winter analysis can also be simplified (at least conceptually) by appealing to Dvoretzky's theorem. Working with the von Neumann entropy, however, requires more effort. First, while [9] relied on a straightforward instance of Milman's "tangible" version [10, 11] of Dvoretzky's theorem for Schatten classes that was documented in the literature already in the 1970s, we now need a more subtle, sharp version (which appears in the literature only implicitly). Second, this sharp version is not applied in the most direct way and requires additional preparatory work (for which we mostly follow the approach of Brandão-Horodecki [6]).

Lower bound for $S_{\min}(\Phi)$: the approach

Since we are going to consider channels with near-maximal minimal output entropy, the following simple inequality (Lemma III.1 in [6], or formula (40) in [3]) will allow us to replace the analysis of the von Neumann entropy $S$ by that of a smoother quantity.

Lemma 2. For every state $\sigma \in \mathcal{D}(\mathbb{C}^k)$,

$$S(\sigma) \geq \log(k) - k \left\| \sigma - \frac{\mathrm{Id}}{k} \right\|_{HS}^2.$$

Consequently, for every quantum channel $\Phi : \mathcal{M}_m \to \mathcal{M}_k$,

(4) $$S_{\min}(\Phi) \geq \log(k) - k \cdot \max_{\rho \in \mathcal{D}(\mathbb{C}^m)} \left\| \Phi(\rho) - \frac{\mathrm{Id}}{k} \right\|_{HS}^2.$$
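
The inequality of Lemma 2 can be probed numerically. The sketch below is only a sanity check under stated assumptions (NumPy, natural logarithm, a hypothetical helper `entropy`, and Dirichlet-distributed spectra chosen purely for illustration). Since both sides are unitarily invariant, it is enough to sample the spectrum of $\sigma$.

```python
import numpy as np

def entropy(ev):
    """Shannon entropy of a spectrum, with the convention 0 log 0 = 0."""
    ev = ev[ev > 1e-15]
    return float(-(ev * np.log(ev)).sum())

rng = np.random.default_rng(1)
k = 8
gaps = []
for _ in range(200):
    # random spectrum; large Dirichlet parameters give states close to Id/k
    ev = rng.dirichlet(np.full(k, rng.uniform(0.5, 50.0)))
    hs_sq = ((ev - 1.0 / k) ** 2).sum()     # ||sigma - Id/k||_HS^2 for this spectrum
    gaps.append(entropy(ev) - (np.log(k) - k * hs_sq))

smallest_gap = min(gaps)                     # nonnegative if the lemma holds on the sample
```

The gap is smallest for spectra close to the uniform one, which is the regime relevant for channels with near-maximal minimal output entropy.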

It will be convenient to identify $\mathbb{C}^k \otimes \mathbb{C}^d$ (or, to be more precise, $\mathbb{C}^k \otimes \overline{\mathbb{C}^d}$, a distinction we will ignore) with $\mathcal{M}_{k,d}$ via the canonical map induced by $u \otimes v \mapsto |u\rangle\langle v|$. If $x \in \mathbb{C}^k \otimes \mathbb{C}^d$ is so identified with a matrix $M \in \mathcal{M}_{k,d}$, then

(5) $$\mathrm{Tr}_{\mathbb{C}^d}\, |x\rangle\langle x| = MM^\dagger.$$

Via this identification, Schmidt coefficients of $|x\rangle$ coincide with singular values of $M$. While the tensor and matrix formalisms are equivalent, the matrix formalism is arguably more transparent, which sometimes leads to simpler arguments.
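
The identity (5) can be verified directly in coordinates. The following minimal sketch assumes NumPy and the computational-basis convention in which the coefficients of $x$ are indexed row-major by (index in $\mathbb{C}^k$, index in $\mathbb{C}^d$); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
k, d = 3, 4
x = rng.standard_normal(k * d) + 1j * rng.standard_normal(k * d)
x /= np.linalg.norm(x)                      # unit vector in C^k (x) C^d
M = x.reshape(k, d)                         # the same coefficients as a k x d matrix

proj = np.outer(x, x.conj()).reshape(k, d, k, d)
reduced = np.einsum('ajbj->ab', proj)       # Tr_{C^d} |x><x|

schmidt_sq = np.sort(np.linalg.eigvalsh(reduced))              # squared Schmidt coefficients
singular_sq = np.sort(np.linalg.svd(M, compute_uv=False) ** 2) # squared singular values of M
```

Here `reduced` agrees with `M @ M.conj().T`, and the two sorted spectra coincide, in line with the remark on Schmidt coefficients.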

Denote by $\mathcal{W} \subset \mathbb{C}^k \otimes \mathbb{C}^d$ the subspace inducing $\Phi$. Note that the maximum in (4) is necessarily attained on pure states which, in this identification, correspond to unit vectors $x \in \mathcal{W}$. For such states the action of $\Phi$ is given, in the matrix formalism, by (5), and so the inequality (4) can be rewritten as

(6) $$S_{\min}(\Phi) \geq \log(k) - k \cdot \max_{M \in \mathcal{W},\ \|M\|_{HS} = 1} \left\| MM^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS}^2.$$

The idea will be to show that, for a random subspace $\mathcal{W}$, the maximum on the right is very small; this will be formalized in the next proposition.



The main proposition and the derivation of the main theorem

The heart of the argument is the following proposition.

Proposition 3. There are absolute constants $c, C, C' > 0$ so that for every $k$, for $d = Ck^2$ and $m = cd$, a random Haar-distributed subspace $\mathcal{W}$ of dimension $m$ in $\mathcal{M}_{k,d}$ satisfies

(7) $$\max_{M \in \mathcal{W},\ \|M\|_{HS} = 1} \left\| MM^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS} \leq \frac{C'}{k}$$

with large probability (tending to 1 when $k$ tends to $\infty$).

From the proposition one quickly deduces that the pair $(\Phi, \bar\Phi)$ is a counterexample to the additivity of minimum output von Neumann entropy. Indeed, a straightforward calculation shows that applying $\Phi \otimes \bar\Phi$ to the maximally entangled state yields an output state with one eigenvalue greater than or equal to $\frac{\dim \mathcal{W}}{\dim \mathbb{C}^k \cdot \dim \mathbb{C}^d} = \frac{m}{kd} = \frac{c}{k}$ ([8], Lemma III.3; see also section 6 in [12]). Then, a simple argument using just concavity of $S$ reduces the problem to calculating the entropy of the state with one eigenvalue equal to $\frac{c}{k}$ and all the remaining ones identical, which yields

$$S_{\min}(\Phi \otimes \bar\Phi) \leq 2 \log k - \frac{c \log k}{k} + \frac{1}{k}.$$

On the other hand, equation (6) together with Proposition 3 implies

$$S_{\min}(\Phi) \geq \log k - \frac{C'^2}{k}.$$

Since $S_{\min}(\Phi) = S_{\min}(\bar\Phi)$, the inequality of Theorem 1 follows if $k$ is large enough, as required.

Dvoretzky's theorem: take one

We wish to point out that while Proposition 3 will be derived from a Dvoretzky-like theorem for Lipschitz functions (Theorem 4 below), it can be rephrased in the language of the standard Dvoretzky's theorem. Indeed, its assertion says that for every $M \in \mathcal{W}$ with $\|M\|_{HS} = 1$ we have

(8) $$\frac{C'^2}{k^2} \geq \left\| MM^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS}^2 = \mathrm{Tr}|M|^4 - \frac{2\,\mathrm{Tr}\,MM^\dagger}{k} + \frac{\mathrm{Tr}\,\mathrm{Id}}{k^2} = \mathrm{Tr}|M|^4 - \frac{1}{k} \geq 0.$$

Consequently,

(9) $$k^{-1/4}\|M\|_{HS} \leq \|M\|_4 \leq k^{-1/4}\left(1 + \frac{C'^2}{k}\right)^{1/4}\|M\|_{HS} \leq k^{-1/4}\left(1 + \frac{C'^2}{k}\right)\|M\|_{HS}$$

for all $M \in \mathcal{W}$. In other words, $\mathcal{W}$ is $(1+\delta)$-Euclidean, with $\delta = C'^2/k$, when considered as a subspace of the normed space $(\mathcal{M}_{k,d}, \|\cdot\|_4)$, the Schatten 4-class.

In our prior work [9] we similarly observed that the crucial technical step of the Hayden-Winter proof of non-additivity of $p$-Rényi entropy for $p > 1$ can be restated as an instance of Dvoretzky's theorem for the Schatten $2p$-class. There is an important difference, however. While in the case of $p$-Rényi entropy the needed Dvoretzky-type statement was known since the 1970s, for the statement of the type (9) needed in the present context, the "off the shelf" methods seem to yield only $\delta = O(k^{-1/4})$ as opposed to $\delta = O(k^{-1})$ above. This also suggests that while for the $p$-Rényi entropy derandomization of the example (i.e., supplying explicit channels for which the additivity fails) may be a feasible project (see section IX in [9] and references therein), the analogous task for the von Neumann entropy is likely to be much harder.
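
The algebraic identity behind (8) and the lower bound in (9) can be confirmed on a random matrix. This is a minimal sketch, assuming NumPy; the sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d = 5, 40
M = rng.standard_normal((k, d)) + 1j * rng.standard_normal((k, d))
M /= np.linalg.norm(M)                       # normalize so that ||M||_HS = 1

s = np.linalg.svd(M, compute_uv=False)       # singular values of M
lhs = np.linalg.norm(M @ M.conj().T - np.eye(k) / k) ** 2   # ||MM^dagger - Id/k||_HS^2
rhs = (s ** 4).sum() - 1.0 / k               # Tr|M|^4 - 1/k
schatten4 = (s ** 4).sum() ** 0.25           # ||M||_4
```

The two quantities `lhs` and `rhs` agree up to rounding, and `schatten4` is at least $k^{-1/4}$, which is the first inequality in (9) for a unit Hilbert-Schmidt norm.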

Dvoretzky's theorem: take two

We use the following definitions: if $f$ is a function from a metric space $(X, d)$ to $\mathbb{R}$, and $\mu \in \mathbb{R}$, the oscillation of $f$ around $\mu$ on a subset $A \subset X$ is

$$\mathrm{osc}(f, A, \mu) = \sup_A |f - \mu|.$$

A function $f$ defined on the unit sphere $S_{\mathbb{C}^n}$ is called circled if $f(e^{i\theta}x) = f(x)$ for any $x \in S_{\mathbb{C}^n}$, $\theta \in [0, 2\pi]$. If $X$ is a real random variable, we will say that $\mu$ is a central value of $X$ if $\mu$ is either the mean of $X$, or any number between the 1st and the 3rd quartile of $X$ (i.e., if $\min\{P(X \geq \mu), P(X \leq \mu)\} \geq \frac{1}{4}$; this happens in particular if $\mu$ is the median of $X$).

We will need the following variant of Milman's "tangible" version of Dvoretzky's theorem.

Theorem 4 (Dvoretzky's theorem for Lipschitz functions). If $f : S_{\mathbb{C}^n} \to \mathbb{R}$ is a 1-Lipschitz circled function, then for every $\varepsilon > 0$, if $E \subset \mathbb{C}^n$ is a random subspace (Haar-distributed) of dimension $c_0 n \varepsilon^2$, we have with large probability

$$\mathrm{osc}(f, S_{\mathbb{C}^n} \cap E, \mu) \leq \varepsilon,$$

where $\mu$ is any central value of $f$ (with respect to the normalized Lebesgue measure on $S_{\mathbb{C}^n}$) and $c_0$ is an absolute constant. If the function is $L$-Lipschitz, the dimension changes to $c_0 n (\varepsilon/L)^2$.

A striking application of the theorem above is to the case when / is the gauge function of 
a convex body, or a norm: it leads to the fact that any high-dimensional convex body has 
almost spherical sections. 

At the heart of Dvoretzky-like phenomena lies the concentration of measure, which in our framework is expressed by

Lemma 5 (Lévy's lemma [13]). If $f : S^{n-1} \to \mathbb{R}$ is a 1-Lipschitz function, then for every $\varepsilon > 0$,

$$P(|f(x) - \mu| > \varepsilon) \leq C_1 \exp(-c_1 n \varepsilon^2),$$

where $x$ is uniformly distributed on $S^{n-1}$, $\mu$ is any central value of $f$, and $C_1, c_1 > 0$ are absolute constants.
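
The phenomenon described by Lévy's lemma is easy to observe empirically. The sketch below is a Monte Carlo illustration only, assuming NumPy; it uses the 1-Lipschitz function $f(x) = x_1$ (first coordinate, central value 0), and the helper `tail_fraction`, the sample size, and the value of $\varepsilon$ are hypothetical choices.

```python
import numpy as np

def tail_fraction(n, eps, samples, rng):
    """Empirical P(|x_1| > eps) for x uniformly distributed on the sphere S^{n-1}."""
    g = rng.standard_normal((samples, n))
    x1 = g[:, 0] / np.linalg.norm(g, axis=1)   # first coordinate of a uniform point
    return float(np.mean(np.abs(x1) > eps))

rng = np.random.default_rng(4)
eps = 0.2
fractions = {n: tail_fraction(n, eps, 5000, rng) for n in (10, 100, 1000)}
```

For fixed $\varepsilon$ the empirical tail probability drops rapidly as the dimension $n$ grows, in agreement with the bound $C_1 \exp(-c_1 n \varepsilon^2)$.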

Results such as Theorem 4 or Lévy's lemma are usually stated with $\mu$ equal to the median or the mean of $f$. However, once we know that the result is true for some central value (or, for that matter, for any $\mu \in \mathbb{R}$), it holds a posteriori for any such value (up to changes in the constants) as, for 1-Lipschitz functions, all central values differ at most by $C/\sqrt{n}$.

The obvious idea to prove Theorem 4 is to use Lévy's lemma and an $\varepsilon$-net argument, using the fact that an $\varepsilon$-net in $S_{\mathbb{C}^n} = S^{2n-1}$ can be chosen to have cardinality $\leq (1 + 2/\varepsilon)^{2n}$ (see [14], Lemma 4.10). Indeed, this was essentially Milman's original argument in [10]. However, one only obtains this way a subspace $E$ of dimension $c n \varepsilon^2 / \log(1/\varepsilon)$. For many applications (including our previous paper [9]), this extra logarithmic factor is not an issue. However, in the present case, having the optimal dependence on $\varepsilon$ is crucial.

The classical framework of convex geometry is the real case (with or without the assumption "circled," which in that context just means that the function is even). In that setting, Theorem 4 was proved by Gordon [15], who used comparison inequalities for Gaussian processes. A proof based on concentration of measure was later given by Schechtman [16]. The complex case does not seem to appear in the literature. Actually, on the face of it, Gordon's proof does not extend to the complex setting, while Schechtman's proof does. We sketch Schechtman's proof of Theorem 4 in Appendix A. It is not clear whether the assumption "$f$ circled" in Theorem 4 can be completely removed; we do know that it is needed at most for very small values of $\varepsilon$.



Proof of the main proposition 



Let $S_{HS}$ be the Hilbert-Schmidt sphere in $\mathcal{M}_{k,d}$ and let $M$ be a random matrix uniformly distributed on $S_{HS}$. Let $g(\cdot)$ be the function defined on $S_{HS}$ by

$$g(M) = \left\| MM^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS}.$$

The next well-known lemma asserts that the singular values of a very rectangular random matrix are very concentrated. This is a familiar phenomenon in random matrix theory that goes back to [17]. Versions of this lemma appeared in the QIT literature under the tensor formalism (see for example Lemma III.4 in [18]). However, these versions typically introduce an unnecessary logarithmic factor which would imply that the main proposition holds with $d = Ck^2 \log k$ instead of $d = Ck^2$. For completeness, we include a proof of Lemma 6 in Appendix B.

Lemma 6. There exist absolute constants $C, c > 0$ such that, if $M$ is uniformly distributed on the Hilbert-Schmidt sphere in $\mathcal{M}_{k,d}$ ($d \geq C^2 k$), then with probability larger than $1 - \exp(-ck)$,

(10) $$\mathrm{spec}(MM^\dagger) \subset \left[ \left( \frac{1}{\sqrt{k}} - \frac{C}{\sqrt{d}} \right)^2, \left( \frac{1}{\sqrt{k}} + \frac{C}{\sqrt{d}} \right)^2 \right].$$

We note that inclusion (10) can be reformulated as follows: all singular values of $M$ differ from $1/\sqrt{k}$ by less than $C/\sqrt{d}$. (Recall that the singular values of $M$ correspond to the Schmidt coefficients of a random pure state in $\mathbb{C}^k \otimes \mathbb{C}^d$.)
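
The concentration asserted by Lemma 6 is readily visible in simulation. The following minimal sketch assumes NumPy; a uniform point of the Hilbert-Schmidt sphere is obtained by normalizing a complex Gaussian matrix, and the sizes $k = 20$, $d = 2000$ are illustrative choices, not values from the argument.

```python
import numpy as np

rng = np.random.default_rng(5)
k, d = 20, 2000
G = rng.standard_normal((k, d)) + 1j * rng.standard_normal((k, d))
M = G / np.linalg.norm(G)                   # uniform on the Hilbert-Schmidt sphere
s = np.linalg.svd(M, compute_uv=False)      # the k singular values of M

# largest deviation of a singular value from 1/sqrt(k), in units of 1/sqrt(d)
deviation = np.abs(s - 1 / np.sqrt(k)).max() * np.sqrt(d)
```

In runs of this kind `deviation` stays bounded by a small constant, which is the content of the reformulation of (10) in terms of singular values.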

We will use in the sequel the following immediate corollary of Lemma 6.

Corollary 7. Under the hypotheses of Lemma 6 and denoting $C_0 = 3C$:

(a) with probability larger than $1 - \exp(-ck)$, all eigenvalues of $MM^\dagger$ differ from $1/k$ by less than $C_0/\sqrt{kd}$; consequently, the median (or any fixed quantile) of $g$ is bounded by $C_0/\sqrt{d}$ for $k$ large enough.

(b) if $d \geq C^2 k$, the median (or any fixed quantile) of $\|M\|_\infty$ is bounded by $2/\sqrt{k}$ for $k$ large enough.

We point out that while we chose to present statements (a) and (b) above as consequences of Lemma 6 for clarity and for "cultural" reasons (the lemma being familiar to the QIT community), more precise versions of these statements are available in (or can be readily deduced from) the random matrix literature. Re (a), the study of the distribution of $g$ is, by (8), equivalent to that of $\mathrm{Tr}|M|^4$, and a closed formula for the expected value of the latter is known (up to terms of smaller order, its value is $1/k + 1/d$); see, e.g., [19] (section 8) and its references. Re (b), sharp estimates on the tail of $\|M\|_\infty$ can also be found in [19] (proof of Lemma 7.3); in particular every fixed quantile is $1/\sqrt{k} + 1/\sqrt{d}$ up to terms of smaller order. This result can also be retrieved via methods of earlier papers [20, 21], which focused on the real case.

The function $g$ is 2-Lipschitz on $S_{HS}$, and Corollary 7(a) implies that the median of $g$ is as small as we want for large $d$. However, a direct application of Theorem 4 yields only a bound of order $1/\sqrt{k}$ in (7). The trick, already present in the previous approaches, is to exploit the fact that $g$ has a much smaller Lipschitz constant when restricted to a certain large subset of $S_{HS}$. As we will see, this bootstrapping argument is equivalent to applying Theorem 4 twice.

The following lemma appears in [6] with a rather long proof, but using the matrix formalism completely demystifies it.

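Lemma 8. The function $g$ is $6/\sqrt{k}$-Lipschitz when restricted to the set

$$\Omega = \{ M \in S_{HS} \ \text{s.t.}\ \|M\|_\infty \leq 3/\sqrt{k} \}.$$

Proof. The lemma is a consequence of the following chain of matrix inequalities:

$$\left| \left\| MM^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS} - \left\| NN^\dagger - \frac{\mathrm{Id}}{k} \right\|_{HS} \right| \leq \| MM^\dagger - NN^\dagger \|_{HS}$$
$$= \| M(M^\dagger - N^\dagger) + (M - N)N^\dagger \|_{HS}$$
$$\leq \|M\|_\infty \|M^\dagger - N^\dagger\|_{HS} + \|M - N\|_{HS} \|N^\dagger\|_\infty$$
$$= \left( \|M\|_\infty + \|N\|_\infty \right) \|M - N\|_{HS}.$$

□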
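
The chain of inequalities in the proof of Lemma 8 holds for arbitrary matrices, so it can be tested on random samples. The sketch below assumes NumPy; the helpers `g` and `unit_hs` and the sizes are illustrative. It records the largest value of $|g(M) - g(N)| - (\|M\|_\infty + \|N\|_\infty)\|M - N\|_{HS}$ over random pairs on the Hilbert-Schmidt sphere.

```python
import numpy as np

def g(M):
    """g(M) = ||M M^dagger - Id/k||_HS."""
    k = M.shape[0]
    return np.linalg.norm(M @ M.conj().T - np.eye(k) / k)

def unit_hs(k, d, rng):
    """Random k x d complex matrix with Hilbert-Schmidt norm 1."""
    A = rng.standard_normal((k, d)) + 1j * rng.standard_normal((k, d))
    return A / np.linalg.norm(A)

rng = np.random.default_rng(6)
k, d = 6, 30
worst = -np.inf
for _ in range(500):
    M, N = unit_hs(k, d, rng), unit_hs(k, d, rng)
    # np.linalg.norm(., 2) is the operator norm, np.linalg.norm(.) the Hilbert-Schmidt norm
    bound = (np.linalg.norm(M, 2) + np.linalg.norm(N, 2)) * np.linalg.norm(M - N)
    worst = max(worst, abs(g(M) - g(N)) - bound)
```

A nonpositive value of `worst` is what the lemma predicts; restricting to matrices with $\|M\|_\infty \leq 3/\sqrt{k}$ then gives the Lipschitz constant $6/\sqrt{k}$.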



The function $\|\cdot\|_\infty$ is 1-Lipschitz on $S_{HS}$. By Corollary 7(b), its median is bounded by $2/\sqrt{k}$ for $d \geq C^2 k$. (Note that Lévy's lemma shows that the measure of the complement of $\Omega$ is very small.) An application of the standard Dvoretzky's theorem (i.e., Theorem 4 for norms) to $f = \|\cdot\|_\infty$ with $\mu$ equal to the median of $\|\cdot\|_\infty$ and with $\varepsilon = 1/\sqrt{k}$ (note that the dimension of the ambient space is $n = kd$) shows that the intersection of $S_{HS}$ with a random subspace of dimension $cd$ in $\mathcal{M}_{k,d}$ is contained in $\Omega$ with large probability.

Let $\tilde g$ be a $6k^{-1/2}$-Lipschitz extension of $g|_\Omega$ to $S_{HS}$. Indeed, in any metric space $X$, it is possible to extend any $L$-Lipschitz function $h$ defined on a subset $Y$ without increasing the Lipschitz constant; use, e.g., the formula

$$\tilde h(x) = \inf_{y \in Y} \left[ h(y) + L\, \mathrm{dist}(x, y) \right].$$

This formula also guarantees that the extended function $\tilde g$ is circled. Since $g = \tilde g$ on most of $S_{HS}$, the median of $g$ (resp., $\tilde g$) is a central value of $\tilde g$ (resp., $g$). We apply Theorem 4 to $\tilde g$ with $\varepsilon = 1/k$ and $L = 6k^{-1/2}$ to get ($\mu$ being the median of $g$)

$$\mathrm{osc}(\tilde g, S_{HS} \cap E, \mu) \leq 1/k$$

on a random subspace $E \subset \mathcal{M}_{k,d}$ of dimension $m = c_0 \cdot kd \cdot \left( k^{-1} / (6k^{-1/2}) \right)^2 = c_0 d / 36$. Using Corollary 7(a), we obtain that $\mu \leq 1/k$ for $d \geq (C_0 k)^2$. We then have

$$\mathrm{osc}(\tilde g, S_{HS} \cap E, 0) \leq 2/k.$$

If $S_{HS} \cap E \subset \Omega$ (which, as noticed before, holds with large probability), $g$ and $\tilde g$ coincide on $S_{HS} \cap E$ and therefore $\mathrm{osc}(g, S_{HS} \cap E, 0) \leq 2/k$. This completes the proof of Proposition 3 and hence that of Theorem 1.
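
The Lipschitz extension formula used in the proof above can be illustrated on a finite sample. The following is a minimal sketch under stated assumptions: NumPy, a finite subset $Y$ of $\mathbb{R}^3$ with the Euclidean distance (instead of $\Omega \subset S_{HS}$), and the 1-Lipschitz function $h(y) = \sin(y_1)$; the helper name `lipschitz_extension` is hypothetical.

```python
import numpy as np

def lipschitz_extension(h_vals, Y, L):
    """Return x -> min over y in Y of [h(y) + L * |x - y|] for a finite set Y (rows)."""
    def h_tilde(x):
        return float(np.min(h_vals + L * np.linalg.norm(Y - x, axis=1)))
    return h_tilde

rng = np.random.default_rng(7)
Y = rng.standard_normal((50, 3))            # the subset on which h is given
h_vals = np.sin(Y[:, 0])                    # sin is 1-Lipschitz, hence so is h on Y
L = 1.0
h_tilde = lipschitz_extension(h_vals, Y, L)

X = rng.standard_normal((200, 3))           # arbitrary points of the ambient space
vals = np.array([h_tilde(x) for x in X])
on_Y = np.array([h_tilde(y) for y in Y])    # values of the extension on Y itself
```

On the sample, `on_Y` reproduces `h_vals`, and the extended values satisfy the same Lipschitz bound, since a minimum of $L$-Lipschitz functions is $L$-Lipschitz.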

Appendix A: Proof of Theorem 4 (après Schechtman)

We sketch here a proof of Theorem 4 essentially following Schechtman [16]. As we already mentioned, a simple use of an $\varepsilon$-net argument gives a parasitic factor $\log(1/\varepsilon)$. This can be improved by a chaining argument, which goes back (at least) to Kolmogorov: a way to use $\eta$-nets for all values of $\eta$ simultaneously.

Consider the canonical inclusion $\mathbb{C}^m \subset \mathbb{C}^n$, and let $U \in \mathcal{U}(n)$ be a random Haar-distributed unitary matrix. Then $E := U(\mathbb{C}^m)$ is distributed according to the Haar measure on the Grassmann manifold of $m$-dimensional subspaces. If $f : S_{\mathbb{C}^n} \to \mathbb{R}$ is a 1-Lipschitz circled function with mean $\mu$, we need to show that $\mathrm{osc}(f \circ U, S_{\mathbb{C}^m}, \mu) \leq \varepsilon$ with large probability provided $m \leq c_0 n \varepsilon^2$. We first prove a lemma.

Lemma 9. Let $f : S_{\mathbb{C}^n} \to \mathbb{R}$ be a 1-Lipschitz circled function and $U \in \mathcal{U}(n)$ be a Haar-distributed random unitary matrix. Then for any $x, y \in S_{\mathbb{C}^n}$ with $x \neq y$ and for any $\lambda > 0$,

$$P(|f(Ux) - f(Uy)| > \lambda) \leq C \exp\left( -c n \frac{\lambda^2}{|x - y|^2} \right).$$

Proof. Fix $x, y \in S_{\mathbb{C}^n}$. Since $f$ is circled (and $U$ is $\mathbb{C}$-linear), we may replace $y$ by $e^{i\theta}y$ and choose $\theta$ so that $\langle x | y \rangle$ is real nonnegative; note that this choice of $\theta$ minimizes $|x - y|$ and assures that $x + y$ and $y - x$ are orthogonal. (This is the only really new point needed to accommodate the complex setting.) Set $z = \frac{x+y}{2}$ and $w = \frac{x-y}{2}$; then $x = z + w$ and $y = z - w$. Further, set $\beta = |w| = \frac{1}{2}|x - y|$ (we may assume that $\beta \neq 0$) and $w' = \beta^{-1} w$. Then, conditionally on $u = U(z)$, $U(w')$ is distributed uniformly on the sphere $S_{u^\perp} := S_{\mathbb{C}^n} \cap u^\perp$. Since $U(x) = u + \beta U(w')$ and $U(y) = u - \beta U(w')$, it follows that the conditional (on $u = U(z)$) distribution of $f(Ux) - f(Uy)$ is the same as that of $f_u : S_{u^\perp} \to \mathbb{R}$ defined by

$$f_u(v) = f(u + \beta v) - f(u - \beta v).$$

As is readily seen, $f_u$ is $2\beta$-Lipschitz and its mean is 0. From Lévy's lemma, applied to $f_u$ and to the $(2n - 3)$-dimensional sphere $S_{u^\perp}$, we deduce that, conditionally on $u = U(z)$,

$$P(|f(Ux) - f(Uy)| > \lambda) \leq C_1 \exp\left( -c_1 (2n - 2) \lambda^2 / |x - y|^2 \right),$$

and hence the same inequality holds also without the conditioning. □
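
The phase adjustment at the start of the proof of Lemma 9 can be checked on random vectors. The sketch below assumes NumPy; the variable names and the grid of 721 phases are illustrative. It rotates $y$ by the phase making $\langle x | y \rangle$ real nonnegative, then verifies the orthogonality of $z$ and $w$ and the minimality of $|x - y|$ over a grid of phases.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y /= np.linalg.norm(y)

inner = np.vdot(x, y)                          # <x|y>, conjugate-linear in the first slot
y_rot = y * np.exp(-1j * np.angle(inner))      # e^{i theta} y with theta = -arg <x|y>

z, w = (x + y_rot) / 2, (x - y_rot) / 2        # so x = z + w and y_rot = z - w
thetas = np.linspace(0, 2 * np.pi, 721)
dists = [np.linalg.norm(x - np.exp(1j * t) * y) for t in thetas]
```

After the rotation, `np.vdot(x, y_rot)` is real and nonnegative, `np.vdot(z, w)` vanishes, and `np.linalg.norm(x - y_rot)` does not exceed any entry of `dists`.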

The end of the proof (the actual chaining argument) is identical to that in Schechtman's paper, so, rather than copying it, we present the general principle on which it is based. Let $(S, \rho)$ be a compact metric space and let $(X_s)_{s \in S}$ be a family of mean 0 random variables (a stochastic process indexed by $S$). We say that $(X_s)$ is subgaussian if there are $A, \alpha > 0$ such that, for all $s, t \in S$ with $s \neq t$ and for all $\lambda \geq 0$,

(11) $$P(|X_s - X_t| \geq \lambda) \leq A \exp\left( -\alpha \frac{\lambda^2}{\rho(s,t)^2} \right).$$

Proposition 10 (Dudley's inequality). If $(X_s)_{s \in S}$ satisfies (11) and some mild regularity conditions, then

$$E \sup_{s,t \in S} |X_s - X_t| \leq C' A \alpha^{-1/2} \int_0^\infty \sqrt{\log N(S, \eta)}\, d\eta,$$

where $N(S, \eta)$ is the minimal cardinality of an $\eta$-net of $S$ (in particular the integrand is 0 if $\eta$ is larger than the radius of $S$).

See [22] for the original article, [23] for a generalization to the subgaussian case that is relevant here, and [24] for a book exposition; we also sketch a proof further below for the reader's convenience.

In our case we choose $S = S_{\mathbb{C}^m} \cup \{0\}$ (with the usual Euclidean metric), $X_s = f(Us) - \mu$ if $s \in S_{\mathbb{C}^m}$ and $X_0 = 0$; then

$$\mathrm{osc}(f \circ U, S_{\mathbb{C}^m}, \mu) = \sup_{s \in S} |X_s|.$$

The underlying probability space is $\mathcal{U}(n)$, and the subgaussian property is given by Lemma 9 if $s, t \in S_{\mathbb{C}^m}$ and directly by Lévy's lemma if $s$ or $t$ equals 0. Next, the bound $N(S_{\mathbb{C}^m}, \eta) = N(S^{2m-1}, \eta) \leq (1 + 2/\eta)^{2m}$ mentioned in the comments following Lemma 5 leads to an estimate $2\sqrt{m}$ for the integral and to the bound

$$\mathbf{E} := E \sup_{s \in S} |X_s| \leq E \sup_{s,t \in S} |X_s - X_t| \leq C' C (cn)^{-1/2} \cdot 2\sqrt{m} = C'' \sqrt{\frac{m}{n}}.$$

(For readers confused by different quantities appearing on the left side in different forms of Dudley's inequality, we point out that the first inequality above uses the fact that one of the variables $X_t$ equals 0, and that we always have $\sup_{s,t} |X_s - X_t| = \sup_s X_s + \sup_t (-X_t)$.) The assertion of Theorem 4 follows now from Markov's inequality if $\varepsilon$ is sufficiently larger than $\mathbf{E}$, which is assured by choosing $c_0$ small enough. A slightly more careful argument (such as that given in [16], or see [24]) or an application of the appropriate concentration inequality (for functions on $\mathcal{U}(n)$) yields a bound of the form $\exp(-c' \varepsilon^2 n)$ on the probability of the exceptional set $\sup_{s \in S} |X_s| > C'' \sqrt{m/n} + \varepsilon$ (hence for the exceptional set from Theorem 4).

Let us comment here that the value of the constant $c_0$ given by the proof of Theorem 4 is probably the single most important obstacle to showing Theorem 1 for "reasonable" values of $k, m$. An adaptation of the proof from [15] (which yields good constants) to the complex case could be helpful here.

Proof of Dudley's inequality. For every $k \in \mathbb{Z}$, let $\mathcal{N}_k$ be a $2^{-k}$-net of minimal cardinality for $(S, \rho)$. Let $k_0 \in \mathbb{Z}$ be such that the radius of $S$ lies between $2^{-(k_0+1)}$ and $2^{-k_0}$; the net $\mathcal{N}_{k_0}$ consists of a single element $s_0$. For every $s \in S$ and $k \in \mathbb{Z}$, let $\pi_k(s)$ be an element of $\mathcal{N}_k$ satisfying $\rho(s, \pi_k(s)) \leq 2^{-k}$. The chaining equation reads, for every $s \in S$,

(12) $$X_s = X_{s_0} + \sum_{k \geq k_0} \left( X_{\pi_{k+1}(s)} - X_{\pi_k(s)} \right).$$

(It is here where some regularity of $(X_s)$, namely path continuity, is used.) It follows that

(13) $$\sup_{s,t \in S} |X_s - X_t| \leq 2 \sum_{k \geq k_0} \sup_{s \in S} \left| X_{\pi_{k+1}(s)} - X_{\pi_k(s)} \right| \leq 2 \sum_{k \geq k_0} \sup |X_u - X_{u'}|,$$

where the last supremum is taken over couples $(u, u') \in \mathcal{N}_{k+1} \times \mathcal{N}_k$ satisfying $\rho(u, u') \leq 2^{-k} + 2^{-(k+1)} < 2^{-k+1}$. It remains to bound the expectation of each term in the sum, using the following fact.

Fact 11. If $N \geq 2$ and $Y_1, \dots, Y_N$ are nonnegative random variables satisfying the tail estimate $P(Y_i \geq t) \leq A \exp(-t^2 / 2\beta^2)$ for all $t \geq 0$, then

$$E \max_i Y_i \leq C A \beta \sqrt{\log N}.$$

To bound $E \sup |X_u - X_{u'}|$ we apply the above fact with $\beta = 2^{-k+1/2} \alpha^{-1/2}$ and $N = \mathrm{card}(\mathcal{N}_k) \cdot \mathrm{card}(\mathcal{N}_{k+1}) \leq N(S, 2^{-(k+1)})^2$. This gives

$$E \sup_{s,t \in S} |X_s - X_t| \leq C' A \alpha^{-1/2} \sum_{k \geq k_0} 2^{-k} \sqrt{\log N(S, 2^{-(k+1)})}.$$

The result now follows by relating the last series to the integral in Proposition 10 (a version of the integral test from calculus). □

Proof of Fact 11. We may assume $\beta = 1$ by working with $Y_i / \beta$. Then simply write

$$E \max_i Y_i = \int_0^\infty P(\max_i Y_i \geq t)\, dt \leq \sqrt{2 \log N} + AN \int_{\sqrt{2 \log N}}^\infty \exp(-t^2/2)\, dt \leq \sqrt{2 \log N} + A.$$

The last inequality follows from

$$\int_{\sqrt{2 \log N}}^\infty \exp(-t^2/2)\, dt \leq \int_{\sqrt{2 \log N}}^\infty t \exp(-t^2/2)\, dt = 1/N.$$

Note that the hypotheses force $A \geq 1$. □
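
Fact 11 can be illustrated by a small Monte Carlo experiment. The sketch below assumes NumPy and takes $Y_i = |g_i|$ with $g_i$ independent standard Gaussians, which satisfy the tail estimate with $A = 2$ and $\beta = 1$; independence, the sample sizes, and the variable names are choices made for this illustration only (the fact itself does not require independence).

```python
import numpy as np

rng = np.random.default_rng(9)
N, trials = 1000, 2000
# Y_i = |g_i| satisfies P(Y_i >= t) <= 2 exp(-t^2 / 2), i.e. A = 2 and beta = 1
Y = np.abs(rng.standard_normal((trials, N)))
empirical = float(Y.max(axis=1).mean())     # Monte Carlo estimate of E max_i Y_i
bound = np.sqrt(2 * np.log(N)) + 2.0        # sqrt(2 log N) + A, as in the proof
```

The estimate `empirical` falls below `bound`, and both are of order $\sqrt{\log N}$, which is the scaling asserted by the fact.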

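Appendix B: Proof of Lemma 6

The lemma will follow if we show that with large probability,

$$\|\Delta\|_\infty \leq \frac{C}{\sqrt{kd}},$$

where $\Delta = MM^\dagger - \mathrm{Id}/k \in \mathcal{M}_k$ and $\|\cdot\|_\infty$ is the operator (or spectral) norm. Let $\mathcal{N}$ be a $\frac{1}{4}$-net of $S_{\mathbb{C}^k}$ with cardinality bounded by $(C_0)^k$. One checks that if $x \in S_{\mathbb{C}^k}$ and $\bar{x} \in \mathcal{N}$ satisfy $|x - \bar{x}| \leq 1/4$, then

$$|\langle x|\Delta|x\rangle| \leq |\langle \bar{x}|\Delta|\bar{x}\rangle| + |\langle x - \bar{x}|\Delta|x\rangle| + |\langle \bar{x}|\Delta|x - \bar{x}\rangle| \leq |\langle \bar{x}|\Delta|\bar{x}\rangle| + 2 \cdot \frac{1}{4}\|\Delta\|_\infty,$$

so that taking supremum over $x \in S_{\mathbb{C}^k}$, we get

$$\|\Delta\|_\infty \leq 2 \sup_{\bar{x} \in \mathcal{N}} |\langle \bar{x}|\Delta|\bar{x}\rangle|.$$

Since $\langle \bar{x}|\Delta|\bar{x}\rangle = |M^\dagger \bar{x}|^2 - 1/k$, an application of the union bound gives

$$P\left( \|\Delta\|_\infty \geq \frac{C}{\sqrt{kd}} \right) \leq (C_0)^k \cdot P\left( \left| |M^\dagger x_0| - \frac{1}{\sqrt{k}} \right| \geq \frac{C}{6\sqrt{d}} \right),$$

where $x_0 \in \mathbb{C}^k$ is any fixed unit vector (remember that $d \geq C^2 k$). The probabilities above can be expressed in terms of Beta-type integrals, but it is easier to estimate them using Lévy's lemma. The function $M \mapsto |M^\dagger x_0|$ is 1-Lipschitz on the Hilbert-Schmidt sphere (if $x_0$ is the first vector of the canonical basis, then $M^\dagger x_0$ is essentially the first row of $M$) and

$$E\, |M^\dagger x_0|^2 = \frac{1}{k}.$$

Hence, by Lévy's lemma (with $n = 2kd$ and $\varepsilon = \frac{C}{6\sqrt{d}}$), we get

$$P\left( \|\Delta\|_\infty \geq \frac{C}{\sqrt{kd}} \right) \leq \exp(-ck)$$

for some choice of the constants $C, c > 0$, as required.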